Efficient Image Compression using K-Means Clustering Algorithm
The objective of this project was to compress an image effectively using the K-means clustering algorithm. The complete code for this project can be found on my GitHub page.
K-means Clustering Algorithm
K-means is an algorithm that takes a set of data points, finds similarities between them, and groups similar points into clusters. It has to discover these patterns on its own, without labelled examples, which makes it a classic example of an unsupervised learner.
Working of the K-means Clustering Algorithm
- To start with, the K-means clustering algorithm picks random points among the data points (two in this example, one red and one blue). These points are known as cluster centroids.
- Once the centroid locations are chosen, the algorithm scans every point in the data set. Points closest to the red cluster centroid are shaded red, and points closest to the blue cluster centroid are shaded blue.
- The algorithm then takes the average of all the red points and moves the red cluster centroid to that position. The same is done for the blue cluster centroid.
- The algorithm therefore involves two tasks: Function A, deciding how many cluster centroids to use, and Function B, deciding the position of each cluster centroid and shading every data point with the colour of its nearest centroid. The shading and averaging steps are repeated until the centroids stop moving, which gives the final classification.
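The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used in the project: the function names nearest_centroid and kmeans, the iteration cap, and the seed are all assumptions made for the example, and the initial centroids are simply drawn from the data points.

```python
import numpy as np


def nearest_centroid(points, centroids):
    """Index of the closest centroid for every point (the 'shading' step)."""
    # distances[i, j] is the Euclidean distance from point i to centroid j.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(points, k, n_iters=100, seed=0):
    """Plain k-means on an (n, d) array; returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(n_iters):
        labels = nearest_centroid(points, centroids)

        # Move each centroid to the average of the points assigned to it.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)

        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, nearest_centroid(points, centroids)


# Two synthetic blobs standing in for the "red" and "blue" groups.
demo_rng = np.random.default_rng(1)
red = demo_rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blue = demo_rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
centroids, labels = kmeans(np.vstack([red, blue]), k=2)
print(centroids)
```

In practice a library implementation such as sklearn.cluster.KMeans is preferable, since it adds better initialisation and multiple restarts.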
Methodology used to compress an image
The image which has to be compressed is shown below

- First, we import the necessary libraries:
  - numpy for numerical operations
  - sklearn.cluster.KMeans for performing k-means clustering
  - matplotlib.pyplot for displaying the images
  - PIL.Image for image manipulation
- Next, we define the compress_image function, which takes the path to an image file (image_path) and the desired number of clusters (k) as input.
- Inside the function, we load the image using Image.open(image_path) and convert it to a NumPy array using np.array(image). This allows us to perform operations on the image using NumPy.
- We flatten the image array using image_array.reshape(-1, 3). This converts the 3D image array (height × width × 3) into a 2D array in which each row is one pixel and the three columns are its color channels (RGB).
- We then create an instance of the KMeans class with n_clusters=k and random_state=0. This sets up the k-means clustering algorithm with the desired number of clusters and a fixed random state for reproducibility.
- Next, we use the fit_predict method of the KMeans object to perform clustering on the flattened image array (pixels). This assigns a cluster label to each pixel based on its color.
- We obtain the cluster centers using kmeans.cluster_centers_. These represent the average color values for each cluster.
- To compress the image, we replace the color values of each pixel with the color values of its corresponding cluster center. This is done by creating a new array, compressed_pixels, where each pixel is replaced with its cluster center color.
- We reshape the compressed_pixels array back to the original image dimensions using compressed_pixels.reshape(image_array.shape).
- We create a compressed image from the reshaped array using Image.fromarray(np.uint8(compressed_image_array)). The np.uint8 conversion is needed because the cluster centers are floating-point values, while an 8-bit image stores each channel as an integer from 0 to 255.
- Next, we use matplotlib.pyplot to display the original and compressed images side by side. We create a figure with two subplots, where the first subplot shows the original image (image) and the second subplot shows the compressed image (compressed_image). We also set titles for each subplot and turn off the axis labels. We then call plt.show() to display the figure.
- Finally, we save the compressed image as "compressed_image.jpg" in the current directory using compressed_image.save("compressed_image.jpg").
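Put together, the steps above might look like the following sketch. It is reconstructed from the description and is not necessarily identical to the code in the repository. The output_path and show parameters, the convert("RGB") call, and the lazy matplotlib import are assumptions added for the example so that the function can be run without a display and on images that are not plain RGB.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans


def compress_image(image_path, k, output_path="compressed_image.jpg", show=True):
    # Load the image. convert("RGB") is an added safeguard (assumption) so that
    # grayscale or RGBA files also end up with exactly three channels.
    image = Image.open(image_path).convert("RGB")
    image_array = np.array(image)

    # (height, width, 3) -> (height * width, 3): one row per pixel.
    pixels = image_array.reshape(-1, 3)

    # Cluster the pixel colours; random_state=0 makes the result reproducible.
    kmeans = KMeans(n_clusters=k, random_state=0)
    labels = kmeans.fit_predict(pixels)
    centers = kmeans.cluster_centers_

    # Replace every pixel with the colour of its cluster centre.
    compressed_pixels = centers[labels]
    compressed_image_array = compressed_pixels.reshape(image_array.shape)

    # Cluster centres are floats; an 8-bit image needs integers in 0..255.
    compressed_image = Image.fromarray(np.uint8(compressed_image_array))

    if show:
        # Imported here so the function also works where no display is available.
        import matplotlib.pyplot as plt

        fig, axes = plt.subplots(1, 2, figsize=(10, 5))
        axes[0].imshow(image)
        axes[0].set_title("Original image")
        axes[0].axis("off")
        axes[1].imshow(compressed_image)
        axes[1].set_title(f"Compressed image (k={k})")
        axes[1].axis("off")
        plt.show()

    compressed_image.save(output_path)
    return compressed_image
```

A call such as compress_image("input.jpg", k=15) would then display both images and write the result to the current directory, where "input.jpg" is a placeholder file name.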
Note on choosing the number of cluster centroids
In the code I wrote to compress the image, I used 15 cluster centroids. The number was guided by the colour wheel, which has 3 primary colors, 3 secondary colors, and 6 tertiary colors (12 hues in total); 15 is slightly above that count.
The choice of the number of cluster centroids (k) is a crucial decision in k-means clustering. It determines the level of compression and the quality of the compressed image. The selection of the optimal value for k depends on the specific requirements and trade-offs in terms of image quality and compression ratio.
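To make the trade-off concrete, the sketch below estimates how much storage a k-colour image needs if each pixel is stored as an index into a palette of k colours. This is an idealised calculation, not a measurement of the project's output: it ignores file-format overhead, the function names and the 512 × 512 image size are assumptions, and it applies to indexed formats rather than to a JPEG file, which does not store a palette.

```python
import math


def raw_size_bits(width, height):
    """Uncompressed 24-bit RGB: 8 bits for each of the three channels."""
    return width * height * 24


def palette_size_bits(width, height, k):
    """k RGB palette entries plus one palette index per pixel."""
    bits_per_index = max(1, math.ceil(math.log2(k)))
    palette_bits = k * 24
    index_bits = width * height * bits_per_index
    return palette_bits + index_bits


# Hypothetical 512 x 512 image, compared across several values of k.
for k in (2, 4, 15, 16, 64):
    ratio = raw_size_bits(512, 512) / palette_size_bits(512, 512, k)
    print(f"k={k:>2}: about {ratio:.1f}x smaller than raw 24-bit RGB")
```

Under these assumptions, k = 15 needs 4 bits per pixel, the same as k = 16, so in an indexed format the sixteenth colour comes at no extra storage cost.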
Output
The picture below shows the compressed image

Future Scope of the Project
Image compression is widely used as a preprocessing technique in computer vision and robotics. It enables efficient storage, transmission, and processing of visual data, reducing storage requirements and conserving bandwidth. Image compression is crucial for real-time video streaming, remote sensing, robot vision, embedded systems, medical imaging, and more. By compressing images, these fields benefit from improved performance, lower costs, and enhanced capabilities in resource-constrained environments.